
Clustering the Normalized Compression Distance for Influenza Virus Data

Internal identifier: 000F68 (Main/Exploration); previous: 000F67; next: 000F69

Authors: Kimihito Ito [Japan]; Thomas Zeugmann [Japan]; Yu Zhu [Japan]

Source :

Lecture Notes in Computer Science [ 0302-9743 ]

RBID : ISTEX:F78AFAB7DE19FE4803D32A2D6BFCCB44795CFF08

Abstract

This paper analyzes how useful the normalized compression distance is for clustering hemagglutinin (HA) gene sequences of influenza viruses, depending on the compressor used. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Hierarchical and spectral clustering are compared as well: hierarchical clustering uses hclust from R, and spectral clustering uses the kLines algorithm proposed by Fischer and Poland (2004). The results are very promising and show that an (almost) perfect clustering can be obtained. The zlib compressor gave better results than the bzip2 compressor, and when all data are considered, hierarchical clustering is slightly better than spectral clustering via kLines.
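As a rough illustration of the distance named in the abstract, the sketch below computes the normalized compression distance in its standard form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It is only a minimal sketch under stated assumptions: it uses Python's standard `zlib` and `bz2` modules as stand-ins for the CompLearn Toolkit compressors, the helper names (`ncd`, `distance_matrix`, `random_sequence`, `mutate`) are hypothetical, and the sequences are randomly generated toy strings, not real HA data.

```python
import bz2
import random
import zlib
from typing import Callable, List


def ncd(x: bytes, y: bytes,
        compress: Callable[[bytes], bytes] = zlib.compress) -> float:
    """Normalized compression distance between two byte strings."""
    cx = len(compress(x))
    cy = len(compress(y))
    cxy = len(compress(x + y))  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: List[bytes],
                    compress: Callable[[bytes], bytes] = zlib.compress
                    ) -> List[List[float]]:
    """Pairwise NCD matrix; the diagonal is set to 0 and the matrix is
    made symmetric by computing each pair once."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ncd(seqs[i], seqs[j], compress)
    return d


def random_sequence(length: int, rng: random.Random) -> bytes:
    """Toy nucleotide sequence (assumption: NOT real influenza data)."""
    return "".join(rng.choice("ACGT") for _ in range(length)).encode("ascii")


def mutate(seq: bytes, n_sites: int, rng: random.Random) -> bytes:
    """Copy of seq with n_sites point substitutions at random positions."""
    out = bytearray(seq)
    for pos in rng.sample(range(len(out)), n_sites):
        out[pos] = rng.choice([b for b in b"ACGT" if b != out[pos]])
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    parent = random_sequence(600, rng)
    variant = mutate(parent, 5, rng)       # close relative of parent
    unrelated = random_sequence(600, rng)  # independent sequence
    for name, comp in (("zlib", zlib.compress), ("bzip2", bz2.compress)):
        for row in distance_matrix([parent, variant, unrelated], comp):
            print(name, ["%.3f" % v for v in row])
```

A matrix of this kind is what a hierarchical clustering routine (such as R's hclust, mentioned in the abstract) would take as input. Related sequences should come out closer to each other than unrelated ones, though the exact values depend on the compressor and its settings.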

Url:

https://api.istex.fr/ark:/67375/HCB-2MQPN5CV-Q/fulltext.pdf

DOI: 10.1007/978-3-642-12476-1_9

Affiliations:

Japan

Links to previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000E18
to stream Istex, to step Curation: 000E18
to stream Istex, to step Checkpoint: 000235
to stream Main, to step Merge: 000F76
to stream Main, to step Curation: 000F68

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Clustering the Normalized Compression Distance for Influenza Virus Data</title>
<author><name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
</author>
<author><name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
</author>
<author><name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F78AFAB7DE19FE4803D32A2D6BFCCB44795CFF08</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/978-3-642-12476-1_9</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-2MQPN5CV-Q/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000E18</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000E18</idno>
<idno type="wicri:Area/Istex/Curation">000E18</idno>
<idno type="wicri:Area/Istex/Checkpoint">000235</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000235</idno>
<idno type="wicri:doubleKey">0302-9743:2010:Ito K:clustering:the:normalized</idno>
<idno type="wicri:Area/Main/Merge">000F76</idno>
<idno type="wicri:Area/Main/Curation">000F68</idno>
<idno type="wicri:Area/Main/Exploration">000F68</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Clustering the Normalized Compression Distance for Influenza Virus Data</title>
<author><name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Research Center for Zoonosis Control, Hokkaido University, N-20, W-10 Kita-ku, 001-0020, Sapporo</wicri:regionArea>
<wicri:noRegion>Sapporo</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author><name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Division of Computer Science, Hokkaido University, N-14, W-9, Sapporo, 060-0814</wicri:regionArea>
<wicri:noRegion>060-0814</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author><name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Division of Computer Science, Hokkaido University, N-14, W-9, Sapporo, 060-0814</wicri:regionArea>
<wicri:noRegion>060-0814</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Japon</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: The present paper analyzes the usefulness of the normalized compression distance for the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the available compressors. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Moreover, a comparison is made with respect to hierarchical and spectral clustering. For the hierarchical clustering, hclust from the R package is used, and the spectral clustering is done via the kLine algorithm proposed by Fischer and Poland (2004). Our results are very promising and show that one can obtain an (almost) perfect clustering. It turned out that the zlib compressor allowed for better results than the bzip2 compressor and, if all data are concerned, then hierarchical clustering is a bit better than spectral clustering via kLines.</div>
</front>
</TEI>
<affiliations><list><country><li>Japon</li>
</country>
</list>
<tree><country name="Japon"><noRegion><name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
</noRegion>
<name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
<name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
<name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F68 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000F68 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    H2N2V1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:F78AFAB7DE19FE4803D32A2D6BFCCB44795CFF08
   |texte=   Clustering the Normalized Compression Distance for Influenza Virus Data
}}

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 14 19:59:40 2020. Site generation: Thu Mar 25 15:38:26 2021

	H2N2 exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
